A Review Paper on Voice Isolation Using Artificial Neural Network

Authors: Vikash Kumar, Shivam Kumar Gupta, Harsh Sharma, Uchit Bhadauriya, Chandra Prakash Varma

DOI Link: https://doi.org/10.22214/ijraset.2022.40745

Abstract

Verbal communication has long been the most versatile and easiest way to communicate, and today it is also used to interact with machines, but every medium has its complications. Since the commencement of 4G, voice communication over mobile networks has become commonplace, and the message travelling between sender and receiver is often distorted along the way. This paper reviews how a new classification method tackles this problem by using Artificial Neural Networks (ANN) with the help of machine learning algorithms. The scope of artificial neural networks is vast: they can be used in fields such as speech enhancement, speech identification and many more. By using an artificial neural network with machine learning algorithms, the voice is isolated from the noise that was superimposed on it while travelling through the channel.

I. INTRODUCTION

Verbal communication has long been the most efficient and easiest way to communicate with other human beings. The commencement of 4G multiplied the use of voice communication: before 4G the number of mobile subscribers was around 60 million, and afterwards the figure shot up to 1.18 billion and is still increasing. This growth in consumption has given rise to problems such as poor voice-call quality, e-waste management, high import bills and many more. This paper addresses one of these problems: with heavy use of voice calling, noise sometimes gets mixed with the sender's original message, making it difficult for the receiver to understand the message signal. The paper therefore discusses a new way of isolating the voice using Artificial Neural Networks (ANN) with the help of machine learning algorithms. ANNs have many applications, for example in dictation software and in translators. Initially, the ANN and the machine learning algorithms are given good data to learn from, which increases their efficiency. Later, a number of sources are recorded by a number of receivers (microphones) and sampled at each interval. The result obtained is then compared with the given frequency, and an activation function is applied to the output. The ANN and the algorithms trained earlier are then used to produce the desired output.

II. LITERATURE SURVEY

A neural network is fundamentally built to imitate the activity of the human brain. Experts describe the deep neural network as a framework composed of three kinds of layers: the input layer, the output layer, and the hidden layer that sits between them. The deep neural network is based on deep learning, a subfield of machine learning that uses facets of artificial intelligence to classify and order data. The following section describes the deep neural network architectures employed in various areas to provide state-of-the-art accuracy.

All inputs are given to the model through the input layer. The condition for which the neural network is being trained should be represented in the input layer, and each input should represent an independent variable that can affect the output of the network. The hidden layer lies between the input layer and the output layer; it collects the data to which the activation function is applied and processes what it receives from the previous layer. A neural network can contain many hidden layers, depending on how complex the problem is. If the problem is linearly separable, the activation function can be applied directly to the input layer and no hidden layer is required, whereas if the decisions to be made are complex, 3 to 5 hidden layers can be used. The output layer makes the data available after it has been processed, collecting and transmitting it in the designed way. The pattern presented by the output layer can be traced back to the input layer.

The problem addressed here is the ability to select and isolate one source of audio in a noisy environment from the others so that it can be listened to efficiently, often called the cocktail party problem. Numerous efforts have been made to solve it in many fields, including neurobiology, physiology, computer science and engineering.

Its nature can thus be considered multidisciplinary. A survey of work from 2002 on speech separation with neural networks shows that LPC analysis was used to process the audio signals. The approach was used successfully to separate speech when the interference was broadband noise. The neural network was modified in its hidden layer using data extracted with the LPC algorithm. The drawback of this approach was the limitation on the number of audio samples that could be used for training. This motivates the present work, which aims to remove that limit on the number of training samples.

In another article, Josh McDermott described how different sound sources can be recovered from repetitions across different mixed signals. The approach relied on naturalistic properties of sound, but its drawback was that prior knowledge of the different sounds was required before they could be isolated. For this reason we turned to unsupervised learning, where prior knowledge of the data is not required.

We then studied how noise can be isolated using more efficient algorithms under the umbrella of artificial neural networks. ANN is a technology inspired by biological neural networks; it is the platform on which all the data inputs are processed and several algorithms work together. Different algorithms are applied in the project, isolating and optimizing the signals on the basis of their amplitude and frequency, and we take a short glance at each of them. Independent Component Analysis (ICA) analyses each component by its frequency and amplitude so that the components can be separated. The activation function maps the separated signals into a desired range. Several other algorithms work alongside these to give the desired result.

According to an IEEE publication, the neural network is used as a speaker recognition system to control an iterative filter. The neural network is a modified perceptron with a hidden layer that uses feature data extracted from LPC analysis. The proposed technique has been used successfully for speech separation when the interference is competing speech or broadband noise.

After researching various sources, we conclude that the solution to this problem lies in an artificial neural network with ICA as its main algorithm. ICA solves the problem of separating signals from a linear mixture, and it is used when signals must be separated on the basis of very little information. The whole system rests on a simple assumption: there are N independent signals, which are combined by a mixing matrix into the observed mixed signals.

In summary, speech separation with neural networks using LPC analysis of audio signals was introduced in 2002; recovery of sources from embedded repetition, using a generative model to synthesize sounds with natural properties, was introduced in 2011; and tracing and recognizing the speech of a speaker with an automatic speech recognition algorithm was introduced in 2015.

III. ARTIFICIAL NEURAL NETWORKS

Artificial neural networks are inspired by the biological neural networks of animal brains. An artificial neural network is a collection of interconnected nodes called artificial neurons. Each artificial neuron receives a signal, processes it, and can transmit a signal to the neurons connected to it. The connections between artificial neurons are called edges. Neurons and edges are assigned weights that are adjusted as learning proceeds. The network is broadly divided into three layers: the input layer, the hidden layer and the output layer. Each node of the input layer is connected to every node of the hidden layer, and each node of the hidden layer is connected to every node of the output layer.

All inputs are given to the input layer of the model. Each input should represent an independent variable so that it can affect the output of the network. The input layer sends the data to the hidden layer, where the processing is performed. Depending on the complexity of the problem, there can be multiple hidden layers. After processing, the data is made available at the output layer.
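The flow from input layer to hidden layer to output layer can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the layer sizes, the weight and bias values, and the helper names `dense` and `forward` are all hypothetical, and the sigmoid is assumed as the activation function.

```python
import math


def sigmoid(z):
    """Logistic activation, assumed here for every node."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every input feeds every node of the layer."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def forward(inputs, layers):
    """Pass the signal through each layer in turn."""
    signal = inputs
    for weights, biases in layers:
        signal = dense(signal, weights, biases, sigmoid)
    return signal


# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 1 output node.
hidden = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])
output = ([[0.7, -0.5, 0.2]], [0.05])

prediction = forward([1.0, 0.5], [hidden, output])
print(prediction)
```

Adding more hidden layers only means appending more `(weights, biases)` pairs to the list passed to `forward`.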

IV. INDEPENDENT COMPONENT ANALYSIS

Independent Component Analysis (ICA) is a machine learning algorithm that separates independent sources from a mixed signal. The algorithm assumes that the sources are non-Gaussian signals and are statistically independent of each other. ICA is a special case of blind source separation, meaning that very little is known about the original sources and few assumptions can be made about them. To implement ICA with M sources, at least M observations are needed to recover the original signals.
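These assumptions can be written compactly. The following is a minimal statement of the standard noiseless, instantaneous linear-mixing model for the case of exactly M observations; the symbols s, x, A, W and y are notation introduced here for illustration and do not come from the paper.

```latex
\begin{aligned}
\mathbf{x}(t) &= A\,\mathbf{s}(t), & \mathbf{s}(t),\ \mathbf{x}(t) &\in \mathbb{R}^{M},\quad A \in \mathbb{R}^{M \times M} \\
\mathbf{y}(t) &= W\,\mathbf{x}(t), & W &\approx A^{-1}
\end{aligned}
```

Here s holds the M independent sources, x the M observations and A the unknown mixing matrix. ICA estimates W from x alone so that the components of y are as statistically independent as possible, which recovers the sources only up to their order and scale. With fewer than M observations, A has fewer rows than columns and cannot be inverted, which is why at least M observations are needed.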

V. ACTIVATION FUNCTION

In an artificial neural network, the activation function defines the output of a node for a given set of input signals. This output is then used as input to the next node, and the process continues through the network until an output is produced. The main purpose of the activation function is to convert the input signal of a node into an output signal that becomes the input to the nodes of the next layer. Typically the activation function maps the resulting value into the range 0 to 1 or -1 to 1, depending on the function used. In an ANN, the products of the inputs and their weights are summed, and the activation function is applied to that sum to produce the output that is fed to the following layers as input.
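As a small illustration of the two ranges mentioned above, the sketch below computes one node's output with the sigmoid function, which maps into (0, 1), and with tanh, which maps into (-1, 1). The input values, weights, bias and the helper name `neuron` are hypothetical; the paper does not state which activation function it uses.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Sum of the input-weight products plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


inputs = [0.8, -1.5, 2.0]     # hypothetical input signals
weights = [0.4, 0.3, -0.6]    # hypothetical weights
bias = 0.1

out_sigmoid = neuron(inputs, weights, bias, sigmoid)    # lies in (0, 1)
out_tanh = neuron(inputs, weights, bias, math.tanh)     # lies in (-1, 1)
print(out_sigmoid, out_tanh)
```

Either output can then be passed on as an input to the nodes of the next layer.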

The activation function plays a vital role in an ANN. Without an activation function the output is a linear function of the input, which is not incorrect in itself. The problem is that linear functions are easy to solve but limited in the complexity they can represent, so a purely linear model does not perform well on most problems.
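Why a network without activation functions stays linear can be seen by composing two layers. The weights W_1, W_2 and biases b_1, b_2 below are illustrative symbols, not notation from the paper.

```latex
\begin{aligned}
\mathbf{h} &= W_1 \mathbf{x} + \mathbf{b}_1 \\
\mathbf{y} &= W_2 \mathbf{h} + \mathbf{b}_2
            = (W_2 W_1)\,\mathbf{x} + (W_2 \mathbf{b}_1 + \mathbf{b}_2)
\end{aligned}
```

However many such layers are stacked, the result is still a single linear map of the input. A nonlinear activation between the layers is what allows the network to represent more complex relationships.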

VI. WORK SYSTEM

Voice isolation is done by making efficient use of the Independent Component Analysis algorithm and the Gradient Descent algorithm. The methodology of this system can be divided into five steps. First, M sources are taken as input. In the second step, these M sources are recorded at discrete intervals of time by M distinct microphones. In the third step, the amplitude of the sound is recorded, the pitch of the recorded sounds is compared with the given frequencies, and the activation function is applied. In the fourth step, the data is trained over epochs (one epoch means training the neural network with all the training data for one cycle) to attain the best result: the ICA algorithm and the mathematical conditions are applied, and the system then checks whether the epoch count is less than the number of iterations. If the condition is true, new random parameters are passed along with the input signals and the ICA algorithm is applied again. This process continues until the condition is false. In the final step, the trained data is used to separate the voices from the mixed signals and generate separate audio files.
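The loop in the fourth step can be sketched for the smallest case of two sources and two microphones. This is a simplified illustration under stated assumptions, not the authors' implementation: the sources are synthetic (a sine and a square wave), the mixing matrix is made up, the "random parameter" is a single rotation angle applied after whitening, and a plain random search that keeps the best-scoring angle stands in for gradient descent. The contrast used (sum of absolute excess kurtosis) is one common measure of non-Gaussianity; the paper does not say which one it uses. The amplitude and pitch comparison, the activation step and the writing of audio files are omitted.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def center(values):
    m = mean(values)
    return [v - m for v in values]


def whiten(x1, x2):
    """Decorrelate two centred signals and scale each to unit variance."""
    a = mean([p * p for p in x1])
    d = mean([q * q for q in x2])
    b = mean([p * q for p, q in zip(x1, x2)])
    phi = 0.5 * math.atan2(2 * b, a - d)      # principal axis of the 2x2 covariance
    half_gap = math.hypot((a - d) / 2, b)
    lam1 = (a + d) / 2 + half_gap             # larger eigenvalue
    lam2 = (a + d) / 2 - half_gap             # smaller eigenvalue
    c, s = math.cos(phi), math.sin(phi)
    z1 = [(c * p + s * q) / math.sqrt(lam1) for p, q in zip(x1, x2)]
    z2 = [(-s * p + c * q) / math.sqrt(lam2) for p, q in zip(x1, x2)]
    return z1, z2


def rotate(z1, z2, theta):
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * p + s * q for p, q in zip(z1, z2)]
    y2 = [-s * p + c * q for p, q in zip(z1, z2)]
    return y1, y2


def contrast(y1, y2):
    """Sum of absolute excess kurtosis of two unit-variance signals."""
    k1 = mean([v ** 4 for v in y1]) - 3.0
    k2 = mean([v ** 4 for v in y2]) - 3.0
    return abs(k1) + abs(k2)


def separate(x1, x2, epochs=200, seed=0):
    z1, z2 = whiten(center(x1), center(x2))
    rng = random.Random(seed)
    best_score, best = -1.0, (z1, z2)
    epoch = 0
    while epoch < epochs:                         # the epoch check from the text
        theta = rng.uniform(0.0, math.pi / 2)     # new random parameter
        y1, y2 = rotate(z1, z2, theta)
        score = contrast(y1, y2)
        if score > best_score:                    # keep the least Gaussian result
            best_score, best = score, (y1, y2)
        epoch += 1
    return best


def correlation(u, v):
    u, v = center(u), center(v)
    num = sum(p * q for p, q in zip(u, v))
    return num / math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))


# Two synthetic sources mixed by a hypothetical 2x2 matrix.
n = 1000
t = [i / n for i in range(n)]
s1 = [math.sin(2 * math.pi * 5 * u) for u in t]
s2 = [1.0 if math.sin(2 * math.pi * 13 * u) >= 0 else -1.0 for u in t]
x1 = [1.0 * p + 0.6 * q for p, q in zip(s1, s2)]   # microphone 1
x2 = [0.4 * p + 1.0 * q for p, q in zip(s1, s2)]   # microphone 2

y1, y2 = separate(x1, x2)
print(abs(correlation(y1, s1)), abs(correlation(y1, s2)))
print(abs(correlation(y2, s1)), abs(correlation(y2, s2)))
```

Each recovered signal should correlate strongly with exactly one of the sources. Which source ends up in which output, and with which sign, is not determined, which reflects the order and scale ambiguity inherent in ICA.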

Conclusion

This review paper has discussed the basics of voice isolation using artificial neural networks. ANN models try to replicate complex system behaviour and patterns and are able to learn through experience, which opens many possibilities for their general use. ANNs are one of the promising directions for future computing, and this paper shows that they can be very useful in voice isolation. They operate more like the human brain than like conventional computer logic. Different types of ANN and different types of algorithms have been briefly discussed in this paper. Voice isolation fascinates many scientists and has had a technological influence on society. We hope this paper helps readers understand the basics of ANNs and inspires research groups working on automatic voice isolation. The future of this technology is very bright, and the key lies in hardware development, as ANNs need faster hardware.

References

[1] Dr. Kavita, Jitendra Joshi, "A Review: Speech Reorganization by Using Artificial Neural Network" (https://www.researchgate.net/publication/331275651), 2019.
[2] Bhushan C. Kamble, "Speech recognition using artificial neural network - A review", International Journal of Computing and Instrumentation Engg. (IJCCIE), Vol. 3, Issue 1, 2016.
[3] Vishnu D. Asal and R. I. Patel, "A review on prediction of EDM parameter using artificial neural network", International Journal of Scientific Research, Vol. 2(3), 2013.
[4] Sonali B. Maind, Priyanka Wanker, "Research Paper on Basic of ANN", International Journal on Recent and Innovation Trends in Computing and Communication, Vol. 1, pp. 96-100, 2012.
[5] Wounter Gevart, Georgi Tsenov, Valeri Mladenov, "Neural Network used for speech Recognition", Journal of Automatic Control, University of Belgrade, Vol. 20, pp. 1-7, 2010.
[6] Vimal Krishnan VR, Athulya Jayakumar, Babu Anto P, "Speech Recognition of Isolated Malayalam words using Wavelet features and Artificial Neural Network", 4th IEEE International Symposium on Electronic Design, Test and Application, 2008.
[7] Zou J., Han Y., So S.S., "Overview of Artificial Neural Network". In: Livingstone D.J. (Eds), Artificial Neural Network, Methods in Molecular Biology™, Vol. 458, 2008.
[8] Yaswanth H, Harish Mahendrakar and Suman Davia, "Automatic Speech Recognition using Audio visual Cues", IEEE India Annual Conference, pp. 166-169, 2004.
[9] Kesheng Wang, Hirpa L. Gelgele, Yi Wang, Qingfeng Yuan and Minglung Fang, "A hybrid intelligent method for modelling the EDM process", International Journal of Machine Tools & Manufacture, Vol. 43, pp. 995-999, 2003.
[10] Lorenzo Vecci, Francesco Piazza and Aurelio Uncini, "Learning and Approximation Capabilities of Adaptive Spline Activation Function Neural Networks", Neural Networks, Vol. 11, No. 2, pp. 259-270, March 1998.
[11] M.R. Ashouri, "Isolated word recognition using high-order statistics and time delay neural network", IEEE Signal Processing Workshop on High Order Statistics, 1997.
[12] https://ieeexplore.org > voice output extraction by signal separation using deep neural network.
[13] Pierre Comon, "Independent Component Analysis, a new concept?", 1994 (https://www.cs.purdue.edu/homes/dgleich/projects/pca_neural_nets_website/).
[14] https://medium.com/analytics-vidhya/https-medium-com-types-ofactivation-functions-in-neural-netwo

Copyright

Copyright © 2022 Vikash Kumar, Shivam Kumar Gupta, Harsh Sharma, Uchit Bhadauriya, Chandra Prakash Varma. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET40745

Publish Date : 2022-03-11

ISSN : 2321-9653

Publisher Name : IJRASET
